Data clustering: a review 
A. K. Jain, M. N. Murty, P. J. Flynn 

September 1999 ACM Computing Surveys (CSUR), volume 31 issue 3 
Publisher: ACM Press 

Additional Information: full citation, abstract, references, citings, index 
terms, review 

Full text available: pdf(636.24 KB) 



Clustering is the unsupervised classification of patterns (observations, data items, or 
feature vectors) into groups (clusters). The clustering problem has been addressed in 
many contexts and by researchers in many disciplines; this reflects its broad appeal and 
usefulness as one of the steps in exploratory data analysis. However, clustering is a 
difficult problem combinatorially, and differences in assumptions and contexts in different 
communities has made the transfer of useful generic co ... 

Keywords: cluster analysis, clustering applications, exploratory data analysis, 
incremental clustering, similarity indices, unsupervised learning 



2 Special issue on special feature: Distributional word clusters vs. words for text 
categorization 

Ron Bekkerman, Ran El-Yaniv, Naftali Tishby, Yoad Winter 

March 2003 The Journal of Machine Learning Research, volume 3 

Publisher: MIT Press 

Full text available: pdf(176.53 KB) Additional Information: full citation, abstract, citings, index terms 

We study an approach to text categorization that combines distributional clustering of 
words and a Support Vector Machine (SVM) classifier. This word-cluster representation is 
computed using the recently introduced Information Bottleneck method, which generates 
a compact and efficient representation of documents. When combined with the 
classification power of the SVM, this method yields high performance in text 
categorization. This novel combination of SVM with word-cluster representation ... 

3 Feature-based similarity search in graph structures 
Xifeng Yan, Feida Zhu, Philip S. Yu, Jiawei Han 

December 2006 ACM Transactions on Database Systems (TODS), volume 31 issue 4 
Publisher: ACM Press 

Full text available: pdf(1.44 MB) Additional Information: full citation, abstract, references, index terms 

Similarity search of complex structures is an important operation in graph-related 
applications since exact matching is often too restrictive. In this article, we investigate the 
issues of substructure similarity search using indexed features in graph databases. By 
transforming the edge relaxation ratio of a query graph into the maximum allowed feature 
misses, our structural filtering algorithm can filter graphs without performing pairwise 
similarity computation. It is further shown that ... 

Keywords: Graph database, complexity, index, similarity search 



4 Special issue on ICML: Coupled clustering: a method for detecting structural 
correspondence 

Zvika Marx, Ido Dagan, Joachim M. Buhmann, Eli Shamir 

March 2003 The Journal of Machine Learning Research, volume 3 

Publisher: MIT Press 

Full text available: pdf(967.15 KB) Additional Information: full citation, abstract, citings, index terms 

This paper proposes a new paradigm and a computational framework for revealing 
equivalencies (analogies) between sub-structures of distinct composite systems that are 
initially represented by unstructured data sets. For this purpose, we introduce and 
investigate a variant of traditional data clustering, termed coupled clustering, which 
outputs a configuration of corresponding subsets of two such representative sets. We 
apply our method to synthetic as well as textual data. Its achievement ... 

5 Research track: Visualizing changes in the structure of data for exploratory feature 
selection 

Elias Pampalk, Werner Goebl, Gerhard Widmer 

August 2003 Proceedings of the ninth ACM SIGKDD international conference on 
Knowledge discovery and data mining KDD '03 

Publisher: ACM Press 

Full text available: pdf(642.44 KB) Additional Information: full citation, abstract, references, citings, index terms 

Using visualization techniques to explore and understand high-dimensional data is an 
efficient way to combine human intelligence with the immense brute force computation 
power available nowadays. Several visualization techniques have been developed to study 
the cluster structure of data, i.e., the existence of distinctive groups in the data and how 
these clusters are related to each other. However, only few of these techniques lend 
themselves to studying how this structure changes if the feature ... 

Keywords: high-dimensional data, interactive data mining 



6 Content 2: image clustering: Web image clustering by consistent utilization of visual 
features and surrounding texts 

Bin Gao, Tie-Yan Liu, Tao Qin, Xin Zheng, Qian-Sheng Cheng, Wei-Ying Ma 

November 2005 Proceedings of the 13th annual ACM international conference on 
Multimedia MULTIMEDIA '05 
Publisher: ACM Press 

Full text available: pdf(1.23 MB) Additional Information: full citation, abstract, references, index terms 

Image clustering, an important technology for image processing, has been actively 
researched for a long period of time. Especially in recent years, with the explosive growth 
of the Web, image clustering has even been a critical technology to help users digest the 
large amount of online visual information. However, as far as we know, many previous 
works on image clustering only used either low-level visual features or surrounding texts, 
but rarely exploited these two kinds of information in the s ... 

Semantic clustering and querying on heterogeneous features for visual data 
Gholamhosein Sheikholeslami, Wendy Chang, Aidong Zhang 

September 1998 Proceedings of the sixth ACM international conference on Multimedia 
MULTIMEDIA '98 

Publisher: ACM Press 

Full text available: pdf(1.37 MB) Additional Information: full citation, references, citings, index terms 



Clustering and searching WWW images using link and page layout analysis 

Xiaofei He, Deng Cai, Ji-Rong Wen, Wei-Ying Ma, Hong-Jiang Zhang 

May 2007 ACM Transactions on Multimedia Computing, Communications, and 
Applications (TOMCCAP), Volume 3 Issue 2 
Publisher: ACM Press 

Full text available: pdf(28.98 MB) Additional Information: full citation, abstract, references, index terms 

Due to the rapid growth of the number of digital images on the Web, there is an 
increasing demand for an effective and efficient method for organizing and retrieving the 
available images. This article describes iFind, a system for clustering and searching WWW 
images. By using a vision-based page segmentation algorithm, a Web page is partitioned 
into blocks, and the textual and link information of an image can be accurately extracted 
from the block containing that image. The textual informatio ... 

Keywords: Web mining, image clustering, image search, link analysis 



9 Applications: Fast retrieval of high-dimensional feature vectors in P2P networks using 
compact peer data summaries 

Wolfgang Muller, Andreas Henrich 

November 2003 Proceedings of the 5th ACM SIGMM international workshop on 
Multimedia information retrieval MIR '03 
Publisher: ACM Press 

Full text available: pdf(378.07 KB) Additional Information: full citation, abstract, references, citings, index terms 

The retrieval facilities of most Peer-to-Peer (P2P) systems are limited to queries based on 
a unique identifier or a small set of keywords. The techniques used for this purpose are 
hardly applicable for content-based image retrieval (CBIR) in a P2P network. Furthermore, 
we will argue that the curse of dimensionality and the high communication overhead 
prevent the adaptation of multidimensional search trees or fast sequential scan 
techniques for P2P CBIR. In the present paper we will propose two ... 

10 Description and Analysis: Using web structure for classifying and describing web 
pages 

Eric J. Glover, Kostas Tsioutsiouliklis, Steve Lawrence, David M. Pennock, Gary W. Flake 

May 2002 Proceedings of the 11th international conference on World Wide Web 
WWW '02 
Publisher: ACM Press 

Full text available: pdf(136.12 KB) Additional Information: full citation, abstract, references, citings, index terms 

The structure of the web is increasingly being used to improve organization, search, and 
analysis of information on the web. For example, Google uses the text in citing documents 
(documents that link to the target document) for search. We analyze the relative utility of 
document text, and the text in citing documents near the citation, for classification and 
description. Results show that the text in citing documents, when available, often has 
greater discriminative and descriptive power than th ... 

Keywords: SVM, anchortext, classification, cluster naming, entropy based feature 
extraction, evaluation, web directory, web structure 



11 Research track paper: Consistent bipartite graph co-partitioning for star-structured 
high-order heterogeneous data co-clustering 

Bin Gao, Tie-Yan Liu, Xin Zheng, Qian-Sheng Cheng, Wei-Ying Ma 

August 2005 Proceeding of the eleventh ACM SIGKDD international conference on 
Knowledge discovery in data mining KDD '05 

Publisher: ACM Press 

Full text available: pdf(560.22 KB) Additional Information: full citation, abstract, references, citings, index terms 

Heterogeneous data co-clustering has attracted more and more attention in recent years 
due to its high impact on various applications. While the co-clustering algorithms for two 
types of heterogeneous data (denoted by pair-wise co-clustering), such as documents and 
terms, have been well studied in the literature, the work on more types of heterogeneous 
data (denoted by high-order co-clustering) is still very limited. As an attempt in this 
direction, in this paper, we worked on a specific case of ... 

Keywords: co-clustering, consistency, high-order heterogeneous data, spectral graph 



12 Spectral clustering for multi-type relational data 

Bo Long, Zhongfei (Mark) Zhang, Xiaoyun Wu, Philip S. Yu 

June 2006 Proceedings of the 23rd international conference on Machine learning 
ICML '06 

Publisher: ACM Press 

Full text available: pdf(240.84 KB) Additional Information: full citation, abstract, references, cited by, index terms 

Clustering on multi-type relational data has attracted more and more attention in recent 
years due to its high impact on various important applications, such as Web mining, 
e-commerce and bioinformatics. However, the research on general multi-type relational 
data clustering is still limited and preliminary. The contribution of the paper is three-fold. 
First, we propose a general model, the collective factorization on related matrices, for 
multi-type relational data clustering. The model is appli ... 



13 Concept features in Re:Agent, an intelligent Email agent 
Gary Boone 

May 1998 Proceedings of the second international conference on Autonomous 
agents AGENTS '98 

Publisher: ACM Press 

Full text available: pdf(1.07 MB) Additional Information: full citation, references, citings, index terms 




14 Research sessions: Research 7: Indexing: FIX: feature-based indexing technique for 
XML documents 

Ning Zhang, M. Tamer Ozsu, Ihab F. Ilyas, Ashraf Aboulnaga 

September 2006 Proceedings of the 32nd international conference on Very large data 
bases VLDB '06 
Publisher: VLDB Endowment 

Full text available: pdf(965.30 KB) Additional Information: full citation, abstract, references, index terms 

Indexing large XML databases is crucial for efficient evaluation of XML twig queries. In this 
paper, we propose a feature-based indexing technique, called FIX, based on spectral 
graph theory. The basic idea is that for each twig pattern in a collection of XML 
documents, we calculate a vector of features based on its structural properties. These 
features are used as keys for the patterns and stored in a B+ tree. Given an XPath query, 
its feature vector is first calculated and looked up ... 



15 Multimedia and visualization (MV): A pivot-based index structure for combination of 
feature vectors 

Benjamin Bustos, Daniel Keim, Tobias Schreck 

March 2005 Proceedings of the 2005 ACM symposium on Applied computing SAC '05 

Publisher: ACM Press 

Full text available: pdf(172.44 KB) Additional Information: full citation, abstract, references, index terms 

We present a novel indexing schema that provides efficient nearest-neighbor queries in 
multimedia databases consisting of objects described by multiple feature vectors. The 
benefits of the simultaneous usage of several (statically or dynamically) weighted feature 
vectors with respect to retrieval effectiveness have been previously demonstrated. 
Support for efficient multi-feature vector similarity queries is an open problem, as existing 
indexing methods do not support dynamically p ... 

Keywords: combination of features, content-based indexing and retrieval, nearest 
neighbor queries 



16 Semantic annotation and integration: Web taxonomy integration using support vector 
machines 

Dell Zhang, Wee Sun Lee 

May 2004 Proceedings of the 13th international conference on World Wide Web 
WWW '04 

Publisher: ACM Press 

Full text available: pdf(191.33 KB) Additional Information: full citation, abstract, references, citings, index terms 

We address the problem of integrating objects from a source taxonomy into a master 
taxonomy. This problem is not only currently pervasive on the web, but also important to 
the emerging semantic web. A straightforward approach to automating this process would 
be to train a classifier for each category in the master taxonomy, and then classify objects 
from the source taxonomy into these categories. In this paper we attempt to use a 
powerful classification method, Support Vector Machine (SVM), to a ... 

Keywords: classification, ontology mapping, semantic web, support vector machines, 
taxonomy integration, transductive learning 



17 Real-time shading 

Marc Olano, Kurt Akeley, John C. Hart, Wolfgang Heidrich, Michael McCool, Jason L. Mitchell, 
Randi Rost 

August 2004 ACM SIGGRAPH 2004 Course Notes SIGGRAPH '04 

Publisher: ACM Press 

Full text available: pdf(7.39 MB) Additional Information: full citation, abstract 

Real-time procedural shading was once seen as a distant dream. When the first version of 
this course was offered four years ago, real-time shading was possible, but only with 
one-of-a-kind hardware or by combining the effects of tens to hundreds of rendering passes. 
Today, almost every new computer comes with graphics hardware capable of interactively 
executing shaders of thousands to tens of thousands of instructions. This course has been 
redesigned to address today's real-time shading capabili ... 

http://portal.acm.org/results.cfm?coll=ACM&dl=ACM&CFID=30063308& 7/29/2007 

Results (page 1): cluster structure categories feature vector 

18 Content 2: image clustering: Iteratively clustering web images based on link and 
attribute reinforcements 
Xin-Jing Wang, Wei-Ying Ma, Lei Zhang, Xing Li 

November 2005 Proceedings of the 13th annual ACM international conference on 
Multimedia MULTIMEDIA '05 

Publisher: ACM Press 

Full text available: pdf(248.02 KB) Additional Information: full citation, abstract, references, index terms 

Image clustering is an important research topic which contributes to a wide range of 
applications. Traditional image clustering approaches are based on image content features 
only, while content features alone can hardly describe the semantics of the images. In the 
context of Web, images are no longer assumed homogeneous and "flat" distributed but are 
richly structured. There are two kinds of reinforcements embedded in such data: 1) the 
reinforcement between attributes of different data types (int ... 

Keywords: image clustering, iterative reinforcement, link mining 



19 PageCluster: Mining conceptual link hierarchies from Web log files for adaptive Web 
site navigation 

Jianhan Zhu, Jun Hong, John G. Hughes 

May 2004 ACM Transactions on Internet Technology (TOIT), volume 4 issue 2 
Publisher: ACM Press 

Full text available: pdf(280.84 KB) Additional Information: full citation, abstract, references, citings, index terms 

User traversals on hyperlinks between Web pages can reveal semantic relationships 
between these pages. We use user traversals on hyperlinks as weights to measure 
semantic relationships between Web pages. On the basis of these weights, we propose a 
novel method to put Web pages on a Web site onto different conceptual levels in a link 
hierarchy. We develop a clustering algorithm called PageCluster, which clusters 
conceptually-related pages on each conceptual level of the link hierarchy based on th ... 

Keywords: Link hierarchies, Web site navigation, bibliographic analysis, clustering, 
conceptual link hierarchies, link similarity 



20 A comparative study for domain ontology guided feature extraction 
Bill B. Wang, R. I. Bob Mckay, Hussein A. Abbass, Michael Barlow 

February 2003 Proceedings of the 26th Australasian computer science conference - 
Volume 16 ACSC '03 

Publisher: Australian Computer Society, Inc. 

Full text available: pdf(119.73 KB) Additional Information: full citation, abstract, references, citings, index terms 

We introduced a novel method employing a hierarchical domain ontology structure to 
extract features representing documents in our previous publication (Wang 2002). All raw 
words in the training documents are mapped to concepts in a concept hierarchy derived 